Extracting Collocations from Text Corpora

Author

  • Dekang Lin
Abstract

A collocation is a habitual word combination. Collocational knowledge is essential for many tasks in natural language processing. We present a method for extracting collocations from text corpora. By comparison with the SUSANNE corpus, we show that both high precision and broad coverage can be achieved with our method. Finally, we describe an application of the automatically extracted collocations for computing word similarities.


Similar resources

Retrieving Domain-Specific Collocations by Co-occurrences and Word Order Constraints

In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method comprises the following stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings as collocations. Through this method, various types of domain-specific collocations can be retrieved simultaneously. This method is pr...


Retrieving Collocations by Co-occurrences and Word Order Constraints

In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method retrieves collocations in the following stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings, in accordance with their word order in a corpus, as collocations. Through this method, a wide range of collocations, especially...


Extracting collocations and their translations from parallel corpora

Identifying collocations in a text (e.g., break record) and correctly translating them (battre record vs. *casser record) represent key issues in machine translation, notably because of their prevalence in language and their syntactic flexibility. This article describes a method for discovering translation equivalents for collocations from parallel corpora, aimed at increasing the lexical cover...


Extracting Verb-Noun Collocations from Text

In this paper, we describe a new method for extracting monolingual collocations. The method, which is based on statistical methods, extracts VN collocations from large textual corpora. Being able to extract a large number of collocations is critical to machine translation and many other applications. The method has an element of snowballing in it. Initially, one identifies a pattern that will prod...


Extracting Bilingual Collocations from Non-Aligned Parallel Corpora

This paper proposes a new method to find correspondences of uninterrupted collocations from Japanese-English bilingual corpora without sentence-to-sentence alignment. Uninterrupted collocations in English such as “once again”, “give up”, or “gross national product”, which are handled as a single word or a compound word in Japanese, can be automatically extracted with corresponding Japanese words using wor...




Publication date: 1998